Outline:
install.packages("nlstools")
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.1/nlstools_2.0-0.tgz'
Content type 'application/x-gzip' length 422224 bytes (412 KB)
==================================================
downloaded 412 KB
The downloaded binary packages are in
/var/folders/zs/2z4rlxv54pv7ml4jld2kccmc0000gn/T//RtmpBN2o67/downloaded_packages
library(nlstools)
'nlstools' has been loaded.
IMPORTANT NOTICE: Most nonlinear regression models and data set examples
related to predictive microbiolgy have been moved to the package 'nlsMicrobio'
Data from the moodle platform
with(labor, hist(labour, breaks = "fd"))
with(labor, hist(production, breaks = "fd"))
with(labor, hist(capital, breaks = "fd"))
with(labor, hist(wage, breaks = "fd"))
with(labor, plot(production, labour))
with(labor, plot(capital, labour))
with(labor, plot(wage, labour))
First model
\[ \text{labour} = \beta_0 + \beta_1\text{capital} + \beta_2\text{production} + \beta_3\text{wage} + \epsilon \]
m1 <- lm(labour ~ I(capital/1000) + I(production/100000) + I(wage/1000), data = labor)
summary(m1)
Call:
lm(formula = labour ~ I(capital/1000) + I(production/1e+05) +
I(wage/1000), data = labor)
Residuals:
Min 1Q Median 3Q Max
-1267.04 -54.11 -14.02 37.20 1560.48
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 287.7186 19.6418 14.65 <2e-16 ***
I(capital/1000) -4590.4910 268.9693 -17.07 <2e-16 ***
I(production/1e+05) 15.4005 0.3556 43.30 <2e-16 ***
I(wage/1000) -67.4190 5.0141 -13.45 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 156.3 on 565 degrees of freedom
Multiple R-squared: 0.9352, Adjusted R-squared: 0.9348
F-statistic: 2716 on 3 and 565 DF, p-value: < 2.2e-16
plot(m1)
Reference plots
set.seed(123)
n <- 1000
x <- rchisq(n, df =2 )
y <- 1 + 5*x + rnorm(n)
m2 <- lm(y~x)
plot(m2)
\[ \text{labour} = \alpha \times \text{capital}^{\beta_1} \times \text{production}^{\beta_2}\times\text{wage}^{\beta_3}\times\epsilon \] log-transformed model is given by the following formula
\[ \log(\text{labour}) = \gamma + \beta_1\log(\text{capital}) + \beta_2\log(\text{production}) + \beta_3\log(\text{wage}) + \psi \]
m3 <- lm(log(labour) ~ log(capital) + log(production) + log(wage), data = labor)
summary(m3)
Call:
lm(formula = log(labour) ~ log(capital) + log(production) + log(wage),
data = labor)
Residuals:
Min 1Q Median 3Q Max
-3.9744 -0.1641 0.1079 0.2595 1.9466
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.948540 0.569922 -1.664 0.0966 .
log(capital) -0.003697 0.018770 -0.197 0.8439
log(production) 0.990047 0.026410 37.487 <2e-16 ***
log(wage) -0.927764 0.071405 -12.993 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4653 on 565 degrees of freedom
Multiple R-squared: 0.843, Adjusted R-squared: 0.8421
F-statistic: 1011 on 3 and 565 DF, p-value: < 2.2e-16
plot(m3)
Verification of assumptions using lmtest package
lmtest::bptest(m3) ## heteroskedasticity test
studentized Breusch-Pagan test
data: m3
BP = 7.7269, df = 3, p-value = 0.05201
Verification of assumptions using olsrr package
ols_test_breusch_pagan(m3) ## test for Heteroskedasticity
Breusch Pagan Test for Heteroskedasticity
-----------------------------------------
Ho: the variance is constant
Ha: the variance is not constant
Data
---------------------------------------
Response : log(labour)
Variables: fitted values of log(labour)
Test Summary
-------------------------------
DF = 1
Chi2 = 19.48789
Prob > Chi2 = 1.012395e-05
Note that results from lmtest::bptest and ols_test_breusch_pagan differ. This is because these two functions does the test in a different way, i.e. ols_test_breusch_pagan uses thest that assumes error terms to be normally distributed, while lmtest::bptest does not assume that.
To estimate model with heteroskedastic-robust standard errors you may use lm_robust function
m1_r <- lm_robust(log(labour) ~ log(capital) + log(production) + log(wage),
data = labor, se_type = "HC2")
summary(m1_r)
Call:
lm_robust(formula = log(labour) ~ log(capital) + log(production) +
log(wage), data = labor, se_type = "HC2")
Standard error type: HC2
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) -0.948540 0.83598 -1.13464 2.570e-01 -2.59055 0.69347 565
log(capital) -0.003697 0.03840 -0.09628 9.233e-01 -0.07913 0.07173 565
log(production) 0.990047 0.04745 20.86659 4.057e-72 0.89685 1.08324 565
log(wage) -0.927764 0.08758 -10.59346 4.837e-24 -1.09978 -0.75574 565
Multiple R-squared: 0.843 , Adjusted R-squared: 0.8421
F-statistic: 530.7 on 3 and 565 DF, p-value: < 2.2e-16
Do the same using sandwich package
coeftest(m3, vcov = vcovHC(m3, type = "HC2"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.9485402 0.8359804 -1.1346 0.2570
log(capital) -0.0036975 0.0384041 -0.0963 0.9233
log(production) 0.9900474 0.0474465 20.8666 <2e-16 ***
log(wage) -0.9277642 0.0875790 -10.5935 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robust standard errors: clustered/panel data
summary(model_pzn)
Call:
lm(formula = price ~ flat_area + flat_rooms + individual + flat_furnished +
flat_for_students + flat_balcony, data = pzn_rent_subset)
Residuals:
Min 1Q Median 3Q Max
-5154.5 -277.8 -41.4 216.4 10214.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 542.7392 12.9917 41.776 < 2e-16 ***
flat_area 21.8874 0.3495 62.634 < 2e-16 ***
flat_rooms 57.0316 8.1317 7.014 2.42e-12 ***
individualTRUE 11.9591 10.7216 1.115 0.265
flat_furnishedTRUE 146.1789 10.6453 13.732 < 2e-16 ***
flat_for_studentsTRUE -165.6744 10.5976 -15.633 < 2e-16 ***
flat_balconyTRUE 67.5169 8.6081 7.843 4.67e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 518 on 15665 degrees of freedom
Multiple R-squared: 0.4407, Adjusted R-squared: 0.4405
F-statistic: 2057 on 6 and 15665 DF, p-value: < 2.2e-16
We calculate standard errors using two formulas: HC2 and CR2.
model_pzn_hc2
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 542.73918 22.7187480 23.889485 6.416171e-124 498.207816 587.27055 15665
flat_area 21.88742 0.9099827 24.052565 1.468118e-125 20.103746 23.67109 15665
flat_rooms 57.03157 14.6818633 3.884492 1.029658e-04 28.253427 85.80972 15665
individualTRUE 11.95911 10.4886922 1.140190 2.542244e-01 -8.599941 32.51815 15665
flat_furnishedTRUE 146.17888 11.7486837 12.442149 2.267245e-35 123.150100 169.20765 15665
flat_for_studentsTRUE -165.67437 9.3304040 -17.756398 7.432573e-70 -183.963037 -147.38570 15665
flat_balconyTRUE 67.51693 8.6929091 7.766897 8.539199e-15 50.477822 84.55603 15665
model_pzn_cr2
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 542.73918 57.714476 9.4038658 9.754312e-07 416.364110 669.11426 11.48673
flat_area 21.88742 2.374966 9.2158866 2.302644e-06 16.631257 27.14358 10.52550
flat_rooms 57.03157 29.228271 1.9512470 7.525866e-02 -6.796511 120.85966 11.75884
individualTRUE 11.95911 27.635294 0.4327476 6.725041e-01 -47.928772 71.84699 12.61559
flat_furnishedTRUE 146.17888 28.815803 5.0728719 3.179829e-04 83.050263 209.30749 11.43720
flat_for_studentsTRUE -165.67437 20.779186 -7.9730924 3.327097e-06 -210.827418 -120.52132 12.29682
flat_balconyTRUE 67.51693 26.226266 2.5744010 2.401600e-02 10.503352 124.53050 12.24897
coeftest(model_pzn, vcov = vcovCL(model_pzn, cluster = ~ quarter, type = "HC2"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 542.739 57.715 9.4039 < 2.2e-16 ***
flat_area 21.887 2.375 9.2159 < 2.2e-16 ***
flat_rooms 57.032 29.228 1.9512 0.05105 .
individualTRUE 11.959 27.635 0.4327 0.66520
flat_furnishedTRUE 146.179 28.816 5.0729 3.963e-07 ***
flat_for_studentsTRUE -165.674 20.779 -7.9731 1.654e-15 ***
flat_balconyTRUE 67.517 26.226 2.5744 0.01005 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
if your data is cross-section data -> HC2 if your data is a time series data -> HAC2 (implemented in sandwich packace) if your data is clustered / panels data -> CR2